Data exploration with self-organizing maps in environmental informatics and bioinformatics

Author

  • Mikko T. Kolehmainen
Abstract

Kolehmainen, Mikko T. Data exploration with self-organizing maps in environmental informatics and bioinformatics. Kopijyvä, Kuopio, Finland, 2004.

The aim of this thesis was to evaluate the usability of self-organizing maps and some other methods of computational intelligence in analysing and modelling problems of environmental informatics and bioinformatics. The concepts of environmental informatics, bioinformatics, computational intelligence and data mining are first defined. There follows an introduction to the data processing chain of knowledge discovery and the methods used in this thesis, namely linear regression, self-organizing maps (SOM), Sammon’s mapping, U-matrix representation, fuzzy logic, c-means and fuzzy c-means clustering, multi-layer perceptron (MLP), and regularization and Bayesian techniques. The challenges posed by environmental processes and bioprocesses are then identified, including missing data problems, complex lagged dependencies among variables, non-linear chaotic dynamics, ill-defined inverse problems, and large search spaces in optimization tasks. The works included in this thesis are then evaluated and discussed. The results show that the combination of SOM and Sammon’s mapping has great potential in data exploration, and can be used to reveal important features of the measurement techniques (e.g. separability of compounds), reveal new information about already studied phenomena, speed up research work, act as a hypothesis generator for traditional research, and supply clear and intuitive visualization of the environmental phenomenon studied. The results of the regression studies show, as expected, that the MLP network yields better estimates when predicting future values of the airborne pollutant concentration of NO₂ than SOM-based regression or the least squares approach using periodic components.
Additionally, the use of local MLP models is shown to be slightly better than a single MLP model for estimating future values of episodes. However, it can be concluded in general that the architectural issues tested cannot by themselves solve the problems of model performance. Finally, recommendations for future work are laid out. Firstly, the data exploration solution should be enhanced with methods from signal processing to enable the handling of measurements with different time scales and of lagged multivariate time series. The main suggestion, however, is to create an integrated environment for testing different hybrid schemes of computational intelligence for better time-series forecasting in environmental informatics and bioinformatics.

Similar resources

P.O. Box 5400, FIN-02015 HUT, Finland

The aim of this thesis was to evaluate the usability of self-organizing maps and some other methods of computational intelligence in analysing and modelling problems of environmental informatics and bioinformatics. The concepts of environmental informatics, bioinformatics, computational intelligence and data mining are first defined. There follows an introduction to the data processing chain of...

Green Product Consumers Segmentation Using Self-Organizing Maps in Iran

This study aims to segment the market based on demographical, psychological, and behavioral variables, and seeks to investigate their relationship with green consumer behavior. In this research, self-organizing maps are used to segment and to determine the features of green consumer behavior. This was a survey type of research study in which eight variables were selected from the demographical,...

Data Clustering and Topology Preservation Using 3D Visualization of Self Organizing Maps

The Self Organizing Map (SOM) is regarded as an excellent computational tool that can be used in data mining and data exploration processes. The SOM usually creates a set of prototype vectors representing the data set and carries out a topology-preserving projection from a high-dimensional input space onto a low-dimensional grid such as a two-dimensional (2D) regular grid or 2D map. The 2D-SOM tech...

A novel representation of genomic sequences for taxonomic clustering and visualization by means of self-organizing maps

MOTIVATION: Self-organizing maps (SOMs) are readily available bioinformatics methods for clustering and visualizing high-dimensional data, provided that such biological information is previously transformed to fixed-size, metric-based vectors. To increase the usefulness of SOM-based approaches for the analysis of genomic sequence data, novel representation methods are required that automatically...

oposSOM: R-package for high-dimensional portraying of genome-wide expression landscapes on bioconductor

MOTIVATION: Comprehensive analysis of genome-wide molecular data challenges bioinformatics methodology in terms of intuitive visualization with single-sample resolution, biomarker selection, functional information mining and highly granular stratification of sample classes. oposSOM combines those functionalities making use of a comprehensive analysis and visualization strategy based on self-orga...

Publication date: 2004